1 Dataset description

There are two submissions: 10267 & 10270.

  • In each submission, 2390 families with .vcf files are included.
  • For each family, two vcf files are provided,
    • one named “sorted”.
    • the other named “annotated”.

Note that, combining 10267 & 10270, there are 2207 families with complete vcf files.

1.1 Submission 10267

FreeBayes is used for annotation.

  • For files named “sorted”,
    • 852 families have no GL/PL information.
  • For files named “annotated”,
    • 984 families have no GL/PL information.
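
A check like the one behind these counts can be sketched as follows. This is a minimal illustration, not the script that was actually used: it assumes an uncompressed, tab-delimited VCF whose ninth column is FORMAT, and the function name has_gl_or_pl is hypothetical.

```python
def has_gl_or_pl(vcf_lines):
    """Return True if any data record lists GL or PL in its FORMAT column.

    Assumption: vcf_lines is an iterable of text lines from an
    uncompressed VCF (for example an open file handle).
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # record has no FORMAT column, so no GL/PL
        format_keys = fields[8].split(":")
        if "GL" in format_keys or "PL" in format_keys:
            return True
    return False


# Two tiny in-memory examples (made-up records, for illustration only).
with_pl = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PL\t0/1:45,0,60\n",
]
without_pl = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n",
]
print(has_gl_or_pl(with_pl), has_gl_or_pl(without_pl))
```

A family would then be counted as lacking GL/PL information when the function returns False for its file.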

Note that father information is missing for FID:13562.



1.2 Submission 10270

FamSeq is used for annotation.

  • For files named “sorted”, there is no GL/PL information.
  • For files named “annotated”,
    • the vcf files of 582 families are empty.
    • the vcf files of 12 families contain fewer than 1000 variants.
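
The two categories above can be reproduced with a simple record count. The sketch below is illustrative only and rests on two assumptions: "empty" is taken to mean a file with no variant records (header lines may still be present), and the input is uncompressed VCF text. The names count_variant_records and classify_vcf are hypothetical.

```python
def count_variant_records(vcf_lines):
    """Count non-header, non-blank lines, i.e. variant records."""
    return sum(
        1 for line in vcf_lines
        if line.strip() and not line.startswith("#")
    )


def classify_vcf(vcf_lines, min_variants=1000):
    """Label a VCF as 'empty', 'sparse' (< min_variants) or 'ok'."""
    n = count_variant_records(vcf_lines)
    if n == 0:
        return "empty"
    if n < min_variants:
        return "sparse"
    return "ok"


# Illustration with made-up records.
header = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\tREF\tALT\n"]
records = ["1\t%d\t.\tA\tG\n" % pos for pos in range(1, 1501)]
print(classify_vcf(header))                 # header only
print(classify_vcf(header + records[:12]))  # a handful of variants
print(classify_vcf(header + records))       # 1500 variants
```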


2 Call de novo mutations

Triodenovo was used to call de novo mutations:

  • Only SNPs with GL/PL information were retained.
  • Only trios were used (note that some families have two or more probands).
  • Filters: --minDP 7 --minDepth 10, with all other options left at their defaults.
  • Post filter:
    • Basic filtering for SNVs. The filter retains a site only if all of the following hold:
      • it is a single-nucleotide site with exactly two alleles;
      • QUAL >= 30;
      • both parents are homozygous reference and the child is heterozygous;
      • the child's heterozygous PL is zero, and the minimum PL of the child's other two genotypes is 30.
    • PL is a phred-scaled genotype likelihood, where the likelihood is defined as P(R|G), with R the aligned bases and G the underlying genotype. A PL difference of 30 therefore means that the likelihood of the called heterozygous genotype in the offspring is at least about 1000 times that of either of the other two genotypes.
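
The post filter can be sketched per site as follows. This is a minimal Python illustration and not the contents of call_deno.R: the function names and argument layout are hypothetical, and the child's PL values are assumed to be ordered (hom-ref, het, hom-alt), as the VCF specification prescribes for biallelic sites.

```python
BASES = {"A", "C", "G", "T"}


def normalise_gt(gt):
    """Turn '0/1', '1|0', ... into a sorted allele tuple, ignoring phase."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def passes_post_filter(ref, alt, qual, father_gt, mother_gt, child_gt,
                       child_pl, min_qual=30.0, min_pl=30):
    """Return True if a site passes the basic de novo SNV filter.

    child_pl: (PL of 0/0, PL of 0/1, PL of 1/1) for the child.
    """
    # Single-nucleotide site with exactly two alleles (one REF, one ALT).
    if ref not in BASES or alt not in BASES:
        return False
    if qual < min_qual:
        return False
    # Both parents homozygous reference, child heterozygous.
    if normalise_gt(father_gt) != ("0", "0"):
        return False
    if normalise_gt(mother_gt) != ("0", "0"):
        return False
    if normalise_gt(child_gt) != ("0", "1"):
        return False
    # Het PL must be 0 and both other genotypes must have PL >= min_pl.
    pl_homref, pl_het, pl_homalt = child_pl
    return pl_het == 0 and min(pl_homref, pl_homalt) >= min_pl


# A PL gap of 30 corresponds to a likelihood ratio of 10 ** (30 / 10).
likelihood_ratio = 10 ** (30 / 10)

# Made-up example sites, for illustration only.
print(passes_post_filter("A", "G", 50.0, "0/0", "0/0", "0/1", (45, 0, 60)))
print(passes_post_filter("A", "G", 50.0, "0/0", "0/0", "0/1", (20, 0, 60)))
```

The first example site is kept; the second is dropped because one alternative genotype has a PL below 30.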

The script is stored in /30days/uqywan67/SSC/scripts/call_deno.R


3 Annotation

ANNOVAR was used to annotate de novo mutations.